Zip files were downloaded from the following sites. The data files were already in a clean CSV format.
Data processing included the following:
Data was downloaded from IPUMS using their interactive data extraction tool. The data covers 2005-2016, as those are the years that provide county FIPS codes. The following variables were used:
Data was aggregated to the county level using weighted statistics, with each record weighted by the person weight variable.
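As a rough illustration of the weighted aggregation described above, the sketch below computes a person-weighted mean per county using only the standard library. The field names are assumptions: `PERWT` is the IPUMS USA person weight, but `county_fips` and the sample records are made up for this example and are not taken from the project's data.

```python
from collections import defaultdict

def weighted_county_means(records, value_key, weight_key="PERWT", county_key="county_fips"):
    """Return {county: weighted mean of value_key}, weighting each person record by weight_key."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in records:
        w = row[weight_key]
        weighted_sum[row[county_key]] += w * row[value_key]
        weight_total[row[county_key]] += w
    # Skip counties whose weights sum to zero to avoid dividing by zero
    return {c: weighted_sum[c] / weight_total[c] for c in weighted_sum if weight_total[c] > 0}

# Made-up person records, for illustration only
people = [
    {"county_fips": "01001", "PERWT": 100, "AGE": 30},
    {"county_fips": "01001", "PERWT": 300, "AGE": 50},
    {"county_fips": "01003", "PERWT": 50, "AGE": 40},
]
county_age = weighted_county_means(people, "AGE")
```

In the first county, the record with weight 300 counts three times as much as the record with weight 100, so the weighted mean is pulled toward the older respondent.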
ACS data from IPUMS USA, University of Minnesota, www.ipums.org
Data was downloaded from Factiva in 100 article chunks. The search parameters were as follows:
The search returned 3,013 results. The raw data was downloaded in RTF format and converted to plain text using the striprtf package in Python. The text was then cleaned, tokenized, and stemmed, and stop words were removed, using nltk. Sentiment was calculated using the nltk VADER sentiment analyzer. TF-IDF analysis was performed using the nltk package. TensorFlow was used to train a bidirectional LSTM neural network that predicts the news publication from the cleaned, tokenized text.
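To make the TF-IDF step more concrete, here is a minimal standard-library sketch of the computation on already-tokenized documents. It uses the plain, unsmoothed textbook formula (term frequency times the log of inverse document frequency); the project itself used nltk, whose exact weighting may differ, and the sample tokens are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Invented, already-cleaned token lists standing in for two articles
docs = [
    ["tax", "vote", "tax"],
    ["vote", "poll"],
]
scores = tf_idf(docs)
```

A term that appears in every document (here `vote`) scores zero, while terms concentrated in one document score higher, which is why TF-IDF is useful for finding vocabulary that distinguishes one publication from another.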
Association rules are a method for showing IF-THEN correlations. This network graph shows ACS demographic variables and the rules that are correlated with U.S. counties picking the Democratic or Republican candidate in the 2016 election. Shading of the rules indicates the degree of dependence among the variables for that rule. For background on the metrics, read here
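For readers unfamiliar with association rule metrics, the sketch below computes the three most common ones (support, confidence, and lift) for a single IF-THEN rule. The item labels and the four sample "counties" are hypothetical, and which metric drives the shading in the graph is not stated here, so treat this only as a guide to how such metrics are defined.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule IF antecedent THEN consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)      # rows matching the IF part
    n_c = sum(1 for t in transactions if consequent <= t)      # rows matching the THEN part
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    # Lift compares confidence with the baseline rate of the consequent
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical counties, each described by a set of categorical labels
counties = [
    {"high_density", "dem"},
    {"high_density", "dem"},
    {"high_density", "rep"},
    {"low_density", "rep"},
]
support, confidence, lift = rule_metrics(counties, {"high_density"}, {"dem"})
```

A lift above 1 means the IF and THEN parts occur together more often than they would if they were independent; a lift of exactly 1 indicates no dependence.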